How challenging is modeling of a data set?

Authors

  • Davood Shamsi
  • Farinaz Koushanfar

Abstract

We introduce a novel methodology for determining how difficult it is to model a given data set. The method formulates modeling as an instance of an optimization problem that consists of an objective function and a set of constraints. We categorize the properties of the data set that could affect the quality of the optimization. In large optimization problems where many properties contribute to solution quality, analytically studying the effect of each property is practically impossible. We propose a number of metrics for evaluating the effectiveness of the optimization on each data set. Using the well-known Plackett and Burman fast simulation methodology, we determine, for each metric, the impact of the categorized data properties under the specified optimization method. We also describe a new approach for combining the impacts of different properties across the various metrics. The method is illustrated on distance measurement data used for estimating the locations of wireless nodes in ad hoc networks.
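The screening step can be sketched with a 12-run Plackett and Burman design. This is a minimal illustration under stated assumptions, not the authors' implementation: the factor names (`noise_level`, `anchor_fraction`, and so on) and `toy_metric` are hypothetical stand-ins for the categorized data properties and for a metric that would really be computed by running the optimization on each configured data set. Each design column sets one property to a low (-1) or high (+1) level, and the main effect of a property is the mean metric at its high level minus the mean at its low level.

```python
import numpy as np

# Generator row of the 12-run Plackett-Burman design; the remaining runs are
# cyclic shifts of it plus a final run with every factor at its low level.
PB12_GENERATOR = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])


def plackett_burman_12(n_factors):
    """Return a 12 x n_factors matrix of factor levels coded as -1 / +1."""
    if not 1 <= n_factors <= 11:
        raise ValueError("a 12-run design screens between 1 and 11 factors")
    rows = [np.roll(PB12_GENERATOR, shift) for shift in range(11)]
    rows.append(-np.ones(11, dtype=int))
    return np.array(rows)[:, :n_factors]


def main_effects(design, response):
    """Mean response at the +1 level minus mean response at the -1 level.

    Each column is balanced (six runs at each level), so this equals
    (2 / N) * X^T y for N runs.
    """
    design = np.asarray(design, dtype=float)
    response = np.asarray(response, dtype=float)
    return 2.0 * design.T @ response / design.shape[0]


# Hypothetical data-set properties and a hypothetical metric. In a real study,
# each run would generate a data set with these levels, execute the
# optimization, and score its result.
FACTORS = ["noise_level", "anchor_fraction", "node_density",
           "outlier_rate", "network_size"]


def toy_metric(levels):
    noise, anchors, density, outliers, size = levels
    return 10.0 + 3.0 * noise - 1.5 * density + 0.5 * size


design = plackett_burman_12(len(FACTORS))
response = np.array([toy_metric(run) for run in design])
effects = main_effects(design, response)

# Rank properties by the magnitude of their effect on this one metric.
for name, effect in sorted(zip(FACTORS, effects), key=lambda p: -abs(p[1])):
    print(f"{name:16s} {effect:+.2f}")
```

Because the toy metric is linear and the design columns are orthogonal, the estimated effects recover twice its coefficients (6 for `noise_level`, -3 for `node_density`, 1 for `network_size`, 0 for the rest). Ranking properties by absolute effect per metric is one simple reading of the output; the paper's own approach for combining impacts across metrics is not reproduced here.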

Similar resources

The Development of Maximum Likelihood Estimation Approaches for Adaptive Estimation of Free Speed and Critical Density in Vehicle Freeways

The performance of many traffic control strategies depends on how accurately the traffic flow models have been calibrated. One of the most applicable traffic flow models in traffic control and management is the LWR or METANET model. Practically, key parameters in the LWR model, including free flow speed and critical density, are parameterized using flow and speed measurements gathered by inductive ...

Using a combination of genetic algorithm and particle swarm optimization algorithm for GEMTIP modeling of spectral-induced polarization data

The generalized effective-medium theory of induced polarization (GEMTIP) is a newly developed relaxation model that incorporates the petro-physical and structural characteristics of polarizable rocks in the grain/porous scale to model their complex resistivity/conductivity spectra. The inversion of the GEMTIP relaxation model parameter from spectral-induced polarization data is a challenging is...

Statistical Background Modeling Based on Velocity and Orientation of Moving Objects

Background modeling is an important step in moving object detection and tracking. In this paper, we propose a new statistical approach in which a sequence of frames is selected according to the velocity and direction of some moving objects, and then an initial background is modeled based on the detection of changes in gray pixel values. To use this sequence of frames, no estimator or distribu...

Data Envelopment Analysis with LINGO Modeling for Technical Educational Group of an Organization

Data Envelopment Analysis (DEA) was developed to help compare the relative performance of decision-making units. It is a non-parametric method for performing frontier analysis. It uses linear programming to estimate the efficiency of multiple decision-making units and it is commonly used in production, management and economics [3]. DEA generates an efficiency score between 0 and 1 for each unit...

How to Set up an Effective Food Tax?; Comment on “Food Taxes: A New Holy Grail?”

Whereas public information campaigns have failed to reverse the rising trend in obesity, economists support food taxes as they suggest they can force individuals to change their eating behavior and make the agro-food industry think more about healthy food products. Excise taxes based on the unhealthy nutrient content would be more effective since they impact more on unhealthy food products than...

Modeling Nonnegative Data with Clumping at Zero: A Survey

Applications in which data take nonnegative values but have a substantial proportion of values at zero occur in many disciplines. The modeling of such “clumped-at-zero” or “zero-inflated” data is challenging. We survey models that have been proposed. We consider cases in which the response for the non-zero observations is continuous and in which it is discrete. For the continuous and then the d...

Publication date: 2007